Evaluating the validity of clustering results based on density criteria and multi-representatives
Authors
Abstract
Although the goal of clustering is intuitively compelling and its notion arises in many fields, it has been difficult to define a unified approach to the clustering problem, and thus diverse clustering approaches abound in the research community. These approaches are based on different clustering principles and assumptions, and they often lead to qualitatively different results. As a consequence, the results of clustering algorithms (i.e., data set partitionings) need to be evaluated for validity against widely accepted criteria. In this paper a cluster validity index, CDbw, is introduced that assesses the compactness and separation of the partitions generated by a clustering algorithm. Given a data set and a set of clustering algorithms, the index enables: i) the selection of the input parameter values that lead an algorithm to the best possible partitioning of the data set, and ii) the selection of the algorithm that provides the optimal partitioning of the data set. CDbw efficiently handles arbitrarily shaped clusters by representing each cluster with a number of points rather than a single representative point. The properties of the validity index are theoretically justified. A full implementation and experimental results confirm the reliability of the index and show that its performance compares favorably with that of several other validity indices.
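The abstract does not state CDbw's formula, so the following is only a minimal Python sketch of the multi-representative idea it describes, not the authors' index. It assumes a farthest-first heuristic for picking well-scattered representatives and a CURE-style shrink toward the cluster centroid; the names `farthest_first_representatives` and `toy_validity` and the parameters `r` and `shrink` are hypothetical. The returned score is a toy separation/compactness ratio (higher is better), chosen just to show how several representatives per cluster can be used to compare partitionings.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_representatives(points: Sequence[Point], r: int,
                                   shrink: float = 0.0) -> List[Point]:
    """Hypothetical helper: pick up to r well-scattered points, then
    move each one a fraction `shrink` (0..1) toward the centroid."""
    c = centroid(points)
    # Start from the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, c))]
    while len(reps) < min(r, len(points)):
        # Add the point whose nearest chosen representative is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, q) for q in reps))
        if min(math.dist(nxt, q) for q in reps) == 0:
            break  # only duplicates of chosen points remain
        reps.append(nxt)
    return [tuple(qi + shrink * (ci - qi) for qi, ci in zip(q, c)) for q in reps]


def toy_validity(clusters: Sequence[Sequence[Point]], r: int = 4,
                 shrink: float = 0.3) -> float:
    """Toy score (NOT CDbw): min distance between representatives of
    different clusters divided by the mean distance from each point to
    its nearest representative in its own cluster."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    reps = [farthest_first_representatives(c, r, shrink) for c in clusters]
    total, n = 0.0, 0
    for pts, rs in zip(clusters, reps):
        for p in pts:
            total += min(math.dist(p, q) for q in rs)
            n += 1
    compactness = total / n
    separation = min(
        math.dist(a, b)
        for i in range(len(reps))
        for j in range(i + 1, len(reps))
        for a in reps[i]
        for b in reps[j]
    )
    return math.inf if compactness == 0 else separation / compactness


# Usage: two well-separated blobs, partitioned correctly vs. mixed up.
blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
blob_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
good_score = toy_validity([blob_a, blob_b])
bad_score = toy_validity([blob_a[:2] + blob_b[:2], blob_a[2:] + blob_b[2:]])
print(f"good partition: {good_score:.2f}, mixed partition: {bad_score:.2f}")
```

In the spirit of uses i) and ii) in the abstract, such a score would be computed for each candidate parameter value or algorithm and the partitioning with the best score kept; the real CDbw additionally accounts for density, which this sketch omits.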
Similar articles
Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors
Homogeneity of groups in studies that use cross-section and multi-level data is important. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of their results. This paper presents the methods known as clustering and homogenization of groups in cross-section studies based on environmental-economic components. To this end, a sample of 92 countries which...
A density-based cluster validity approach using multi-representatives
Although the goal of clustering is intuitively compelling and its notion arises in many fields, it is difficult to define a unified approach to address the clustering problem, and thus diverse clustering algorithms abound in the research community. These algorithms, under different clustering assumptions, often lead to qualitatively different results. As a consequence, the results of clustering a...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, using more than one criterion as an objective function for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research has shown that this theory can significantly increase the stability and performance of...